Fix FLOPs inconsistency and add using_sparse_model flag by Devesh-Maheshwari · Pull Request #21743 · Lightning-AI/pytorch-lightning

Devesh-Maheshwari · 2026-05-26T18:27:33Z

What does this PR do?

The FLOPs values in _CUDA_FLOPS for Hopper GPUs were inconsistent: h100 sxm and h100 pcie used dense per-GPU peaks, while h100 nvl and h200 sxm1 used the headline "with-sparsity" values from NVIDIA's datasheet. As a result, the reported MFU for the same achieved throughput differed by 2× depending on which GPU type was detected.

Changes

Data fix (standardize Hopper on dense):

h100 nvl — replaced sparse values (which were duplicates of h100 sxm's sparse numbers) with the correct H100 NVL dense per-GPU peaks: 30 / 60 / 417.5 / 835.5 / 835.5 / 1670.5 TFLOPS.
h200 sxm1 — tensor-core entries now stored as dense halves (494.5 / 989.5 / 989.5 / 1979 TFLOPS) instead of sparse headline values.
h100 sxm — rounded entries to exact datasheet ÷ 2 (sub-1% drift only).
Sources verified against H100 and H200 datasheets.
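The dense-versus-sparse relationship behind the data fix can be checked with a little arithmetic. The sketch below is illustrative only: the dtype labels are an assumption (the description lists the numbers without naming dtypes), the 2× factor is the one this PR uses as its default, and the achieved-throughput figure is made up.

```python
# Dense tensor-core peaks for "h200 sxm1" in TFLOPS, as listed in this PR.
# The dtype labels are an assumption; the PR description gives only the numbers.
DENSE_TFLOPS = {"tf32": 494.5, "bf16": 989.5, "fp16": 989.5, "int8": 1979.0}

# Headline "with sparsity" figures are the dense peaks times the sparsity factor.
SPARSITY_FACTOR = 2.0
sparse_headline = {dtype: peak * SPARSITY_FACTOR for dtype, peak in DENSE_TFLOPS.items()}

# Hypothetical achieved throughput, used only to show the MFU gap.
achieved_tflops = 400.0
mfu_dense = achieved_tflops / DENSE_TFLOPS["bf16"]
mfu_sparse = achieved_tflops / sparse_headline["bf16"]
ratio = mfu_dense / mfu_sparse

print(sparse_headline)
print(f"MFU vs dense peak:  {mfu_dense:.3f}")
print(f"MFU vs sparse peak: {mfu_sparse:.3f}")
```

Mixing the two conventions in one table therefore changes the reported MFU by the sparsity factor even though the workload is identical.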

API addition (per agreed design in issue thread):

Throughput.__init__ now accepts using_sparse_model: Optional[bool] = None and sparse_cuda_acceleration_factor: float = 2.0.
None (default): no scaling; emits a rank_zero_warn so that users choose explicitly. True: scales available_flops by sparse_cuda_acceleration_factor. False: dense peak, no scaling and no warning.
Forwarded automatically through ThroughputMonitor via **kwargs.
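The three-way behaviour of the flag can be sketched as a standalone function. This is not the code from the PR: `resolve_available_flops` is a hypothetical helper, `warnings.warn` stands in for Lightning's `rank_zero_warn`, and rejecting non-positive factors is an assumption about what "invalid factor" means in the tests.

```python
import warnings
from typing import Optional


def resolve_available_flops(
    available_flops: Optional[float],
    using_sparse_model: Optional[bool] = None,
    sparse_cuda_acceleration_factor: float = 2.0,
) -> Optional[float]:
    """Return the peak FLOPs to divide by when computing MFU (hypothetical helper)."""
    # Assumption: a non-positive factor is treated as invalid.
    if sparse_cuda_acceleration_factor <= 0:
        raise ValueError("sparse_cuda_acceleration_factor must be positive")

    # Unknown peak: nothing to scale and nothing to warn about.
    if available_flops is None:
        return None

    if using_sparse_model is None:
        # Unset flag: keep the dense peak but ask the user to choose explicitly.
        warnings.warn(
            "using_sparse_model is not set; assuming dense peak FLOPs. "
            "Pass True or False to silence this warning."
        )
        return available_flops

    if using_sparse_model:
        # Sparse model: compare against the sparsity-accelerated peak.
        return available_flops * sparse_cuda_acceleration_factor

    # Explicit False: dense peak, no warning.
    return available_flops
```

For example, `resolve_available_flops(989.5e12, using_sparse_model=True)` returns `1979e12`, while passing `False` returns the dense peak unchanged.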

Tests

Four new tests in tests/tests_fabric/utilities/test_throughput.py:

test_throughput_sparse_model_scaling — constructor scales correctly for True / False / None-peak / invalid factor.
test_throughput_sparse_model_warning — warning fires only when unset + peak known; silenced by explicit True/False or available_flops=None.
test_throughput_sparse_model_mfu — end-to-end: MFU halves when sparse flag enabled with default 2× factor.
test_throughput_monitor_sparse_model — ThroughputMonitor forwards both kwargs into Throughput.

All existing tests in the file still pass (37 passing + 5 pre-existing xfails unrelated to this change).
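The end-to-end expectation in test_throughput_sparse_model_mfu follows from MFU being achieved FLOPs divided by peak FLOPs: doubling the denominator halves the result. A minimal sketch of that check, using a hypothetical `compute_mfu` helper and made-up numbers rather than the actual test code:

```python
import math


def compute_mfu(
    achieved_flops_per_sec: float,
    available_flops: float,
    using_sparse_model: bool = False,
    sparse_cuda_acceleration_factor: float = 2.0,
) -> float:
    """MFU = achieved FLOPs/s divided by the (optionally sparsity-scaled) peak."""
    peak = available_flops
    if using_sparse_model:
        peak *= sparse_cuda_acceleration_factor
    return achieved_flops_per_sec / peak


# Made-up workload: 40 units of achieved throughput on a device with a dense peak of 100.
dense_mfu = compute_mfu(40.0, 100.0, using_sparse_model=False)
sparse_mfu = compute_mfu(40.0, 100.0, using_sparse_model=True)

# With the default 2x factor the sparse-flag MFU is half the dense MFU.
assert math.isclose(sparse_mfu, dense_mfu / 2)
```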

codecov-commenter · 2026-05-27T01:09:41Z


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87%. Comparing base (932b7e3) to head (19b8e9c).
⚠️ Report is 4 commits behind head on master.

Additional details and impacted files

@@           Coverage Diff           @@
##           master   #21743   +/-   ##
=======================================
  Coverage      87%      87%           
=======================================
  Files         270      270           
  Lines       23973    23978    +5     
=======================================
+ Hits        20748    20753    +5     
  Misses       3225     3225

deependujha

Hi @Devesh-Maheshwari, thanks for the great work. I've added a couple of review comments, please have a look. Thanks :)

Co-authored-by: Deependu <deependujha21@gmail.com>

deependujha

thanks for the great work. :)

deependujha

please update changelog file too.

deependujha · 2026-06-01T11:12:37Z

Thanks @Devesh-Maheshwari, LGTM. :)

Copilot

Pull request overview

This PR addresses inconsistent GPU peak FLOPs values used for MFU calculations (notably for Hopper GPUs) and introduces an explicit API to opt into sparsity-accelerated peak FLOPs when computing MFU in Fabric’s throughput utilities.

Changes:

Standardizes Hopper entries in _CUDA_FLOPS to use dense peak FLOPs (and normalizes FLOPs constants to consistent scientific notation).
Adds using_sparse_model: Optional[bool] and sparse_cuda_acceleration_factor to Throughput.__init__, with optional scaling of available_flops when sparsity is enabled.
Adds tests for sparse scaling/warnings/MFU behavior and updates the Fabric changelog.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
`src/lightning/fabric/utilities/throughput.py`	Adds sparsity-aware FLOPs scaling/warning to `Throughput` and corrects/normalizes `_CUDA_FLOPS` entries (especially Hopper).
`tests/tests_fabric/utilities/test_throughput.py`	Adds tests covering sparse scaling, warning behavior, MFU impact, and `ThroughputMonitor` kwarg forwarding.
`src/lightning/fabric/CHANGELOG.md`	Documents the new `Throughput` parameters and `_CUDA_FLOPS` changes (needs wording adjustment).


Devesh-Maheshwari added 2 commits May 26, 2026 01:16

added arguments

3a45223

Fix Hopper FLOPs inconsistency and add using_sparse_model flag

6ac9f57

Devesh-Maheshwari requested review from ethanwharris, justusschock and tchaton as code owners May 26, 2026 18:27

Devesh-Maheshwari changed the title ~~git push --set-upstream origin fix-flop-inconsistency~~ Fix Hopper FLOPs inconsistency and add using_sparse_model flag May 26, 2026

Merge branch 'master' into fix-flop-inconsistency

ed10cee

deependujha reviewed May 27, 2026


Update src/lightning/fabric/utilities/throughput.py

7e2bda6

Co-authored-by: Deependu <deependujha21@gmail.com>

deependujha approved these changes May 27, 2026


Devesh-Maheshwari added 2 commits May 27, 2026 13:39

made _CUDA_FLOPS consistent with a*10^b so it is consistent

8f09c93

resolved the reviewers comments

bb8c49d

Devesh-Maheshwari changed the title ~~Fix Hopper FLOPs inconsistency and add using_sparse_model flag~~ Fix FLOPs inconsistency and add using_sparse_model flag May 27, 2026

deependujha requested changes May 28, 2026


Added changed logs

9f98086

deependujha requested a review from Copilot June 1, 2026 11:08

Copilot started reviewing on behalf of deependujha June 1, 2026 11:08

Merge branch 'master' into fix-flop-inconsistency

2485332

deependujha approved these changes Jun 1, 2026


Copilot AI reviewed Jun 1, 2026


Comment thread tests/tests_fabric/utilities/test_throughput.py

Comment thread src/lightning/fabric/utilities/throughput.py

Comment thread src/lightning/fabric/CHANGELOG.md Outdated

deependujha added 2 commits June 1, 2026 16:54

update

94dc34c

update

19b8e9c

Devesh-Maheshwari wants to merge 10 commits into Lightning-AI:master from Devesh-Maheshwari:fix-flop-inconsistency

4 participants
